Toward Selectivity Based Keyword Extraction for Croatian News

Authors

Slobodan Beliga

Ana Mestrovic

Sanda Martincic-Ipsic

Abstract

Our approach proposes a novel network measure, node selectivity, for the task of keyword extraction. Node selectivity is defined as the average strength of the node. First, we show that selectivity-based keyword extraction slightly outperforms extraction based on the standard centrality measures: in-degree, out-degree, betweenness, and closeness. Furthermore, from a data set of Croatian news we extract keyword candidates and expand the extracted nodes to word-tuples ranked by the highest in/out selectivity values. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates the average F1 score is 25.9% and the average F2 score is 24.47%. Selectivity-based extraction does not require linguistic knowledge, since it is derived purely from statistical and structural information of the network.
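To make the measure concrete, here is a minimal sketch of how in/out selectivity could be computed. It is not the authors' implementation. It assumes that "average strength" means node strength (sum of link weights) divided by node degree (number of distinct links), computed separately for incoming and outgoing links, and that the network is a directed word co-occurrence network whose link weights count adjacent word pairs. The function names and the toy sentences are hypothetical.

```python
from collections import defaultdict

def build_cooccurrence_network(sentences):
    """Directed, weighted network: weight of (u, v) counts how often v directly follows u."""
    weights = defaultdict(int)
    for tokens in sentences:
        for u, v in zip(tokens, tokens[1:]):
            weights[(u, v)] += 1
    return dict(weights)

def selectivity(weights):
    """Return (in_selectivity, out_selectivity) as strength / degree per node.

    Assumption: a node with no incoming (or outgoing) links gets selectivity 0.0.
    """
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    nodes = set()
    for (u, v), w in weights.items():
        nodes.update((u, v))
        out_strength[u] += w   # total weight leaving u
        out_degree[u] += 1     # one more distinct outgoing link
        in_strength[v] += w    # total weight entering v
        in_degree[v] += 1      # one more distinct incoming link
    in_sel = {n: in_strength.get(n, 0) / in_degree[n] if n in in_degree else 0.0
              for n in nodes}
    out_sel = {n: out_strength.get(n, 0) / out_degree[n] if n in out_degree else 0.0
               for n in nodes}
    return in_sel, out_sel

# Hypothetical toy input: already tokenized sentences.
sentences = [
    ["government", "announces", "tax", "reform"],
    ["tax", "reform", "takes", "effect"],
    ["tax", "reform", "and", "government"],
]
weights = build_cooccurrence_network(sentences)
in_sel, out_sel = selectivity(weights)

# Rank keyword candidates by the larger of the two selectivity values.
ranked = sorted(in_sel, key=lambda n: max(in_sel[n], out_sel[n]), reverse=True)
print(ranked[:2])
```

In this toy network the pair "tax" and "reform" co-occurs three times over a single link, so both words get a selectivity of 3.0 and rise to the top, which is the intuition behind expanding high-selectivity nodes into word-tuples.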

Similar resources

Keyword extraction: a review of methods and approaches

The paper presents a survey of methods and approaches for the keyword extraction task. In addition to systematizing the methods, the paper gathers a comprehensive review of existing research. Related work on keyword extraction is elaborated for supervised and unsupervised methods, with special emphasis on graph-based methods as well as Croatian keyword extraction. Selectivity-based keyword extracti...

Toward Network-based Keyword Extraction from Multitopic Web Documents

In this paper we analyse the selectivity measure calculated from a complex network in the task of automatic keyword extraction. Texts collected from different web sources (portals, forums) are represented as directed and weighted co-occurrence complex networks of words. Words are nodes, and a link is established between two nodes if the words directly co-occur within a sentence. We ...

Feature Extraction and Clustering of Croatian News Sources

This paper presents the design of a system for feature extraction and classification of news articles from Croatian news sources. An overview of supervised and unsupervised text classification and clustering machine learning techniques is presented. The techniques described are those most widely used for text classification tasks. The paper discusses a number of issues particular to text classi...

Keyword extraction of radio news using domain identification based on categories of an encyclopedia

In this paper, we propose a keyword extraction method for dictation of radio news which consists of several domains. In our method, newspaper articles which are automatically classified into suitable domains are used in order to calculate feature vectors. The feature vectors show term-domain interdependence and are used for selecting a suitable domain of each part of radio news.

Journal:

CoRR

Volume: abs/1407.4723

Publication date: 2014
